STA 210 - Spring 2022
Dr. Mine Çetinkaya-Rundel
The data set contains the “Tomatometer” score (critics) and audience score (audience) for 146 movies rated on Rotten Tomatoes.
We want to fit a line to describe the relationship between the critics score and audience score.
The response, \(Y\), is the variable describing the outcome of interest.
The predictor, \(X\), is the variable we use to help understand the variability in the response.
A regression model is a function that describes the relationship between the response, \(Y\), and the predictor, \(X\).
\[\begin{aligned} Y &= \color{purple}{\textbf{Model}} + \color{blue}{\textbf{Error}} \\[5pt] &= \color{purple}{\mathbf{f(X)}} + \color{blue}{\boldsymbol{\epsilon}} \\[5pt] &= \color{purple}{\boldsymbol{\mu_{Y|X}}} + \color{blue}{\boldsymbol{\epsilon}} \\[5pt] \end{aligned}\]
When we have a quantitative response, \(Y\), and a single quantitative predictor, \(X\), we can use a simple linear regression model to describe the relationship between \(Y\) and \(X\).
\[\begin{aligned} Y &= \mathbf{\beta_0 + \beta_1 X} + \epsilon \end{aligned}\]
\[\boldsymbol{\beta}_1: \text{Slope} \hspace{20mm} \boldsymbol{\beta}_0: \text{Intercept}\]
\[\Large{\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X}\]
\[\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}\]
\[e_i = \text{observed} - \text{predicted} = y_i - \hat{y}_i\]
The least squares line is the one that minimizes the sum of squared residuals:
\[e^2_1 + e^2_2 + \dots + e^2_n\]
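The residual and sum-of-squares definitions above can be sketched in a few lines of Python. This is only an illustration: the five score pairs are invented toy data (not the Rotten Tomatoes data), the two candidate lines are picked by hand rather than fitted, and the function names are hypothetical.

```python
# Toy scores invented for illustration; these are NOT the Rotten Tomatoes data.
critics = [10.0, 35.0, 60.0, 85.0, 95.0]
audience = [30.0, 50.0, 62.0, 80.0, 78.0]


def predict(b0, b1, x):
    """Predicted value: y_hat = b0 + b1 * x."""
    return b0 + b1 * x


def residuals(b0, b1, xs, ys):
    """Residuals e_i = observed - predicted = y_i - y_hat_i."""
    return [y - predict(b0, b1, x) for x, y in zip(xs, ys)]


def sum_sq_residuals(b0, b1, xs, ys):
    """Sum of squared residuals: e_1^2 + e_2^2 + ... + e_n^2."""
    return sum(e ** 2 for e in residuals(b0, b1, xs, ys))


# Two hypothetical candidate lines, chosen by hand rather than fitted.
ssr_sloped = sum_sq_residuals(25.0, 0.6, critics, audience)
ssr_flat = sum_sq_residuals(60.0, 0.0, critics, audience)

print(f"sloped line: {ssr_sloped:.1f}")
print(f"flat line:   {ssr_flat:.1f}")
```

By this criterion, the candidate with the smaller sum of squared residuals describes the toy data better.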
\[\large{\hat{\beta}_1 = r \frac{s_Y}{s_X}}\]
\[ \begin{aligned} s_X &= 30.169 \\ s_Y &= 20.024 \\ r &= 0.781 \end{aligned} \]
\[ \begin{aligned} \hat{\beta}_1 &= 0.781 \times \frac{20.024}{30.169} \\ &= 0.518\end{aligned} \]
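As a quick check of the arithmetic, the slope formula can be evaluated directly from the three summary statistics. The sketch below assumes only the rounded values shown above; the function name `slope_from_summary` is made up for illustration.

```python
def slope_from_summary(r, s_x, s_y):
    """Least squares slope from summary statistics: r * s_Y / s_X."""
    return r * s_y / s_x


# Summary statistics as reported above (rounded to three decimals).
s_x = 30.169  # standard deviation of the predictor X
s_y = 20.024  # standard deviation of the response Y
r = 0.781     # correlation between X and Y

beta1_hat = slope_from_summary(r, s_x, s_y)
print(round(beta1_hat, 3))  # 0.518
```

The slope is the correlation rescaled by the ratio of the two standard deviations, so it is expressed in units of \(Y\) per unit of \(X\).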
\[\large{\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}}\]
\[\begin{aligned} &\bar{x} = 60.850 \\ &\bar{y} = 63.877 \\ &\hat{\beta}_1 = 0.518 \end{aligned}\]
\[ \begin{aligned}\hat{\beta}_0 &= 63.877 - 0.518 \times 60.850 \\ &\approx 32.357 \end{aligned} \]
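The intercept formula can be evaluated the same way. This is a minimal sketch using the rounded means and slope shown above; the function names are hypothetical. Because the inputs are rounded to three decimals, the last digits may differ from what software reports when the model is fit on the raw data.

```python
def intercept_from_means(x_bar, y_bar, beta1_hat):
    """Least squares intercept: y_bar - beta1_hat * x_bar."""
    return y_bar - beta1_hat * x_bar


# Rounded values as reported above.
x_bar = 60.850     # mean of the predictor X
y_bar = 63.877     # mean of the response Y
beta1_hat = 0.518  # estimated slope

beta0_hat = intercept_from_means(x_bar, y_bar, beta1_hat)


def predict(x):
    """Predicted response: y_hat = beta0_hat + beta1_hat * x."""
    return beta0_hat + beta1_hat * x


print(round(beta0_hat, 3))
# The fitted line passes through the point of means (x_bar, y_bar),
# so this prediction equals y_bar up to floating-point rounding.
print(round(predict(x_bar), 3))
```

Defining the intercept this way forces the fitted line through \((\bar{X}, \bar{Y})\), which is why the prediction at the mean of the predictor reproduces the mean of the response.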
✅ Interpret the intercept if
- the predictor can feasibly take values equal to or near zero.
- there are values near zero in the data.
🛑 Otherwise, don’t interpret the intercept!